Crystallizing short-read assemblies around lone Sanger reads

Authors

  • Sajjad Hossain
  • Navid Azimi
  • Steven Skiena
Abstract

New short-read sequencing technologies produce large volumes of 25-30 base paired-end reads. In this paper, we present a sequencing protocol and de novo assembler program (SHORTY) targeted towards such microread data. Our protocol augments short paired-end reads with a trivially small number of Sanger reads (only one to three reads per bacterial genome). Still, these “seed reads” enable us to produce significant assemblies using about half the short-read coverage (50-60X) of comparable assemblers, despite our assumption of base error rates at least 10 times that of other groups. SHORTY exploits two new ideas which we believe to be of interest to the short-read assembly community: (1) using single seed reads to crystallize assemblies, and (2) estimating intercontig distances accurately from multiple spanning paired-end reads. Contact: [email protected]
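The second idea, estimating an intercontig distance from several spanning read pairs, can be illustrated with a minimal sketch. This is not SHORTY's actual estimator, which the abstract does not describe. It is a naive averaging scheme that assumes the library insert size has a known mean and standard deviation, and that each spanning pair contributes a hypothetical tuple `(a, b)`: `a` is the number of bases from the start of the left read to the end of the left contig, and `b` is the number of bases from the start of the right contig to the far end of the right read. The function name `estimate_gap` and its parameters are illustrative only.

```python
from math import sqrt
from statistics import mean


def estimate_gap(spanning_pairs, insert_mean, insert_sd):
    """Naive estimate of the gap between two contigs.

    Each pair (a, b) gives the bases the insert covers inside the left
    contig (a) and inside the right contig (b). Since
    insert = a + gap + b, each pair yields gap ~ insert_mean - (a + b).
    Averaging n pairs shrinks the standard error to insert_sd / sqrt(n),
    assuming independent inserts (an assumption of this sketch).
    """
    if not spanning_pairs:
        raise ValueError("need at least one spanning pair")
    per_pair = [insert_mean - (a + b) for a, b in spanning_pairs]
    gap = mean(per_pair)
    stderr = insert_sd / sqrt(len(per_pair))
    return gap, stderr


# Hypothetical data: three pairs spanning the same gap.
pairs = [(120, 60), (90, 100), (150, 50)]
gap, stderr = estimate_gap(pairs, insert_mean=200.0, insert_sd=20.0)
print(f"gap = {gap:.1f} bases, standard error = {stderr:.2f}")
```

A negative estimate would suggest that the two contigs overlap rather than leave a gap. The sketch ignores the selection bias that arises because only inserts long enough to span the gap are observed.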


Similar resources

Short read fragment assembly of bacterial genomes.

In the last year, high-throughput sequencing technologies have progressed from proof-of-concept to production quality. While these methods produce high-quality reads, they have yet to produce reads comparable in length to Sanger-based sequencing. Current fragment assembly algorithms have been implemented and optimized for mate-paired Sanger-based reads, and thus do not perform well on short rea...


De novo PacBio long-read and phased avian genome assemblies correct and add to reference genes generated with intermediate and short reads

Reference-quality genomes are expected to provide a resource for studying gene structure, function, and evolution. However, often genes of interest are not completely or accurately assembled, leading to unknown errors in analyses or additional cloning efforts for the correct sequences. A promising solution is long-read sequencing. Here we tested PacBio-based long-read sequencing and diploid ass...


Improved assembly of noisy long reads by k-mer validation.

Genome assembly depends critically on read length. Two recent technologies, from Pacific Biosciences (PacBio) and Oxford Nanopore, produce read lengths >20 kb, which yield de novo genome assemblies with vastly greater contiguity than those based on Sanger, Illumina, or other technologies. However, the very high error rates of these two new technologies (∼15% per base) make assembly imprecise a...


Genome Sequencing and Assembly by Long Reads in Plants

Plant genomes generated by Sanger and Next Generation Sequencing (NGS) have provided insight into species diversity and evolution. However, Sanger sequencing is limited in its applications due to high cost, labor intensity, and low throughput, while NGS reads are too short to resolve abundant repeats and polyploidy, leading to incomplete or ambiguous assemblies. The advent and improvement of lo...


Assessment of Metagenomic Assembly Using Simulated Next Generation Sequencing Data

Due to the complexity of the protocols and a limited knowledge of the nature of microbial communities, simulating metagenomic sequences plays an important role in testing the performance of existing tools and data analysis methods with metagenomic data. We developed metagenomic read simulators with platform-specific (Sanger, pyrosequencing, Illumina) base-error models, and simulated metagenomes...




Publication date: 2008